We consider the problem of fitting the parameters of a high-dimensional linear regression model. In the regime where the number of parameters $p$ is comparable to or exceeds the sample size $n$, a successful approach uses an $\ell_1$-penalized least squares estimator, known as the Lasso. Unfortunately, unlike for linear estimators (e.g., ordinary least squares), no well-established method exists to compute confidence intervals or p-values on the basis of the Lasso estimator. Very recently, a line of work \cite{javanmard2013hypothesis, confidenceJM, GBR-hypothesis} has addressed this problem by constructing a debiased version of the Lasso estimator. In this paper, we study this approach for the random design model, under the assumption that a good estimator exists for the precision matrix of the design. Our analysis improves over the state of the art in that it establishes nearly optimal \emph{average} testing power if the sample size $n$ asymptotically dominates $s_0 (\log p)^2$, with $s_0$ being the sparsity level (number of non-zero coefficients). Earlier work obtains provable guarantees only for much larger sample sizes, namely it requires $n$ to asymptotically dominate $(s_0 \log p)^2$. In particular, for random designs with a sparse precision matrix we show that an estimator thereof having the required properties can be computed efficiently. Finally, we evaluate this approach on synthetic data and compare it with earlier proposals.
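
As a rough illustration of the construction studied here, the debiased estimator in the cited line of work can be written in the following form. The notation is assumed for this sketch and is not taken from the text above: $X \in \mathbb{R}^{n \times p}$ is the design matrix, $y \in \mathbb{R}^{n}$ the response vector, $\hat\theta$ the Lasso solution, and $M$ an estimate of the precision matrix of the design.
\[
  \hat\theta^{u} \;=\; \hat\theta + \frac{1}{n}\, M X^{\mathsf T} \bigl( y - X \hat\theta \bigr).
\]
To give a sense of the gap between the two sample size conditions, ignore the unspecified constants and take the hypothetical values $s_0 = 50$ and $p = 10^4$ (natural logarithm, $\log p \approx 9.21$):
\[
  s_0 (\log p)^2 \approx 50 \times 84.8 \approx 4.2 \times 10^{3},
  \qquad
  (s_0 \log p)^2 \approx (460.5)^2 \approx 2.1 \times 10^{5},
\]
so the two thresholds differ by a factor of $s_0$.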